A Study of Performance Scalability by Parallelizing Loop Iterations on Multi-core SMPs
Abstract
Today, the challenge is for software to exploit the parallelism offered by multi-core architectures. This can be done by rewriting the application, by exploiting hardware capabilities, or by relying on the compiler and runtime tools to do the job for us. With the advent of multi-core architectures [1], [2], this problem is becoming increasingly relevant. Even today, few runtime tools can analyze the behavior of such performance-critical applications and recompile them accordingly. Techniques such as OpenMP for shared-memory programs therefore remain useful for exploiting the parallelism available in the machine. This work studies whether loop parallelization, both with and without loop transformations, is an effective way to run scientific programs efficiently on such multi-core architectures. We found the results encouraging, and we believe the approach could yield good results if fully implemented in a production compiler for multi-core architectures.
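To make "loop parallelization with and without transformations" concrete, here is a minimal OpenMP sketch in C. It is not taken from the study: the kernels and the function names `saxpy`, `recur_inner_parallel` and `recur_interchanged` are hypothetical examples. The first loop has independent iterations and needs no transformation. The second nest carries a dependence in its outer loop, and loop interchange (one possible transformation, chosen here for illustration) moves the parallel loop outermost.

```c
/* Minimal OpenMP sketch (hypothetical kernels, not taken from the study).
 * Build with e.g. `gcc -O2 -fopenmp`; without -fopenmp the pragmas are
 * ignored and the same code runs sequentially with identical results. */

/* Without a transformation: each iteration reads x[i], y[i] and writes only
 * y[i], so iterations are independent and the loop can be split across
 * cores with a single directive. */
void saxpy(int n, float a, const float *x, float *y)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Loop nest over row-major n x m arrays:
 *     a[i][j] = a[i-1][j] + b[i][j]
 * The i loop carries a dependence (row i reads row i-1), so as written only
 * the inner j loop may run in parallel. That enters a parallel region, with
 * its implicit barrier, once per row, i.e. n-1 times. */
void recur_inner_parallel(int n, int m, double *a, const double *b)
{
    for (int i = 1; i < n; i++) {
        #pragma omp parallel for
        for (int j = 0; j < m; j++)
            a[i * m + j] = a[(i - 1) * m + j] + b[i * m + j];
    }
}

/* With a transformation: loop interchange is legal here (the dependence
 * distance (1,0) becomes (0,1), still lexicographically positive), and it
 * moves the independent j loop outermost, so one parallel region covers the
 * whole nest. The dependence is now carried only by the sequential i loop. */
void recur_interchanged(int n, int m, double *a, const double *b)
{
    #pragma omp parallel for
    for (int j = 0; j < m; j++)
        for (int i = 1; i < n; i++)
            a[i * m + j] = a[(i - 1) * m + j] + b[i * m + j];
}
```

Both versions of the nest compute the same result. The interchanged one gives coarser-grained parallelism, but in row-major C its inner loop strides through memory by `m` elements, which hurts cache locality. Which variant is faster depends on `n`, `m` and the cache hierarchy, so it has to be measured on the target machine.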
Publication date: 2010